Reinforcement learning for penalty avoiding policy making

Authors

  • Kazuteru Miyazaki
  • Shigenobu Kobayashi
Abstract

Reinforcement learning is a kind of machine learning that aims to adapt an agent to a given environment, using reward as a clue. In general, the purpose of a reinforcement learning system is to acquire an optimal policy that maximizes the expected reward per action. However, maximizing reward is not always the most important goal in every environment. In particular, when a reinforcement learning system is applied to engineering problems, we expect the agent to avoid all penalties. In Markov decision processes, we call a rule a penalty rule if and only if it receives a penalty or it can transit to a penalty state in which it does not contribute to obtaining any reward. After suppressing all penalty rules, we aim to form a rational policy whose expected reward per action is larger than zero. In this paper, we propose the Penalty Avoiding Rational Policy Making algorithm, which suppresses penalties as stably as possible while obtaining reward constantly. Its effectiveness is shown by applying it to tic-tac-toe.
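
As a rough illustration of the penalty-rule definition above, the following Python sketch marks penalty rules by repeating two steps until nothing changes. It is not the authors' algorithm as published. The function name `find_penalty_rules`, the dictionary-based transition model, and the reading that a penalty state is a state in which every available rule is a penalty rule are all assumptions made for this example.

```python
from collections import defaultdict


def find_penalty_rules(transitions, penalised):
    """Mark penalty rules and penalty states (illustrative sketch only).

    transitions: dict mapping a rule (state, action) to the set of states
                 it can transit to (hypothetical representation).
    penalised:   set of rules (state, action) that receive a penalty directly.
    """
    # Collect the actions available in each state.
    actions_in = defaultdict(set)
    for state, action in transitions:
        actions_in[state].add(action)

    penalty_rules = set(penalised)
    penalty_states = set()

    changed = True
    while changed:
        changed = False
        # ASSUMPTION: a state is a penalty state when every rule available
        # in it is already marked as a penalty rule.
        for state, actions in actions_in.items():
            if state in penalty_states:
                continue
            if all((state, a) in penalty_rules for a in actions):
                penalty_states.add(state)
                changed = True
        # A rule becomes a penalty rule if it can transit to a penalty state.
        for rule, next_states in transitions.items():
            if rule not in penalty_rules and next_states & penalty_states:
                penalty_rules.add(rule)
                changed = True
    return penalty_rules, penalty_states


# Toy example: every rule in s1 is penalised, so s1 is a penalty state
# and the rule (s0, a), which can reach s1, becomes a penalty rule.
toy_transitions = {
    ("s0", "a"): {"s1"},
    ("s0", "b"): {"s2"},
    ("s1", "a"): {"s3"},
    ("s1", "b"): {"s3"},
    ("s2", "a"): {"goal"},
}
toy_penalised = {("s1", "a"), ("s1", "b")}
rules, states = find_penalty_rules(toy_transitions, toy_penalised)
```

In this toy model a policy that suppresses the marked rules would still be able to choose `("s0", "b")` and `("s2", "a")`, which matches the idea of forming a rational policy only from the rules that remain after penalty rules are removed.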

Related resources

Reinforcement Learning for Penalty Avoiding Policy Making and its Extensions and an Application to the Othello Game

In general, the purpose of a reinforcement learning system is to learn optimal policies. However, from the engineering point of view, it is useful and important to acquire not only optimal policies but also penalty avoiding policies. In this paper, we focus on the formation of penalty avoiding policies based on the Penalty Avoiding Rational Policy Making algorithm [1]. In applying the algorithm...

Reinforcement Learning in 2-players Games

In general, the purpose of a reinforcement learning system is to learn an optimal policy. However, in 2-player games such as Othello, it is important to acquire a penalty avoiding policy. In this paper, we focus on the formation of penalty avoiding policies based on the Penalty Avoiding Rational Policy Making algorithm [2]. In applying it to large-scale problems, we are confronted with ...

Introduction of Fixed Mode States into Online Profit Sharing and Its Application to Waist Trajectory Generation of Biped Robot

In reinforcement learning of long-term tasks, learning efficiency may deteriorate when an agent’s probabilistic actions cause too many mistakes before task learning reaches its goal. The new type of state we propose – fixed mode – to which a normal state shifts if it has already received sufficient reward – chooses an action based on a greedy strategy, eliminating randomness of action selection...

Multiple-Target Reinforcement Learning with a Single Policy

We present a reinforcement learning approach to learning a single, non-hierarchical policy for multiple targets. In the context of a policy search method, we propose to define a parametrized policy as a function of both the state and the target. This allows for learning a single policy that can navigate the RL agent to different targets. Generalization to unseen targets is implicitly possible w...

Using Reinforcement Learning to Introduce Artificial Intelligence in the CS Curriculum

There are many interesting topics in artificial intelligence that would be useful to stimulate student interest at various levels of the computer science curriculum. They can also be used to illustrate some basic concepts of computer science, such as arrays. One such topic is reinforcement learning – teaching a computer program how to play a game or traverse an environment using a system of rew...

Publication date: 2000